Aligning Protein Sequences with Predicted Secondary Structure
Authors
Abstract
Accurately aligning distant protein sequences is notoriously difficult. Since the amino acid sequence alone often does not provide enough information to obtain accurate alignments under the standard alignment scoring functions, a recent approach to improving alignment accuracy is to use additional information such as secondary structure. We make several advances in the alignment of protein sequences annotated with predicted secondary structure: (1) more accurate models for scoring alignments, (2) efficient algorithms for optimal alignment under these models, (3) improved learning criteria for setting model parameters through inverse alignment, and (4) in-depth experiments evaluating model variants on benchmark alignments. More specifically, the new models use secondary structure predictions and their confidences to modify the scoring of both substitutions and gaps. All models have efficient algorithms for optimal pairwise alignment that run in near-quadratic time. These models have many parameters, which are rigorously learned using inverse alignment under a new criterion that carefully balances score error and recovery error. We then evaluate these models by studying how accurately an optimal alignment under the model recovers benchmark reference alignments that are based on the known three-dimensional structures of the proteins. The experiments show that these new models provide a significant boost in accuracy over the standard model for distant sequences. The improvement for pairwise alignment is as much as 15% for sequences with less than 25% identity, while for multiple alignment the improvement is more than 20% for difficult benchmarks whose accuracy under standard tools is at most 40%.
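To make the idea of structure-aware scoring concrete, the following is a minimal sketch, not the models from the paper. It assumes a plain quadratic-time global alignment with linear gap costs, a substitution score that receives a bonus when the predicted structure types agree (weighted by the product of the two prediction confidences), and a gap cost that grows inside confidently predicted helices (`H`) or strands (`E`). The function name `align_score` and all parameter names and default values are hypothetical, chosen only for illustration.

```python
from typing import List


def align_score(
    a: str, b: str,
    ss_a: str, ss_b: str,
    conf_a: List[float], conf_b: List[float],
    match: float = 1.0, mismatch: float = -1.0,
    ss_bonus: float = 1.0,      # hypothetical bonus for agreeing structure types
    gap: float = -2.0,          # hypothetical base cost per gapped residue
    ss_gap_extra: float = -1.0  # hypothetical extra cost for gaps in H/E regions
) -> float:
    """Optimal global alignment score under an illustrative structure-aware model."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> float:
        # Residue score plus a confidence-weighted bonus when predictions agree.
        base = match if a[i] == b[j] else mismatch
        agree = 1.0 if ss_a[i] == ss_b[j] else 0.0
        return base + ss_bonus * agree * conf_a[i] * conf_b[j]

    def gap_cost(ss: str, conf: List[float], k: int) -> float:
        # Gapping a residue predicted as helix or strand costs more,
        # in proportion to the confidence of that prediction.
        extra = ss_gap_extra * conf[k] if ss[k] in "HE" else 0.0
        return gap + extra

    # D[i][j] = best score for aligning a[:i] with b[:j].
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + gap_cost(ss_a, conf_a, i - 1)
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + gap_cost(ss_b, conf_b, j - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(
                D[i - 1][j - 1] + sub(i - 1, j - 1),
                D[i - 1][j] + gap_cost(ss_a, conf_a, i - 1),
                D[i][j - 1] + gap_cost(ss_b, conf_b, j - 1),
            )
    return D[n][m]
```

With all confidences set to zero, this sketch reduces to ordinary sequence-only scoring, which is one way to see how prediction confidence modulates the contribution of secondary structure.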
Similar resources
Learning Models for Aligning Protein Sequences with Predicted Secondary Structure
Accurately aligning distant protein sequences is notoriously difficult. A recent approach to improving alignment accuracy is to use additional information such as predicted secondary structure. We introduce several new models for scoring alignments of protein sequences with predicted secondary structure, which use the predictions and their confidences to modify both the substitution and gap cos...
Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches
DNA sequence, containing all genetic traits is not a functional entity. Instead, it transfers to protein sequences by transcription and translation processes. This protein sequence takes on a 3D structure later, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can figure is undertaken by proteins with specific functio...
Isolation and characterization of Phi class glutathione transferase partial gene from Iranian barley
Glutathione transferases are multifunctional proteins involved in several diverse intracellular events such as primary and secondary metabolisms, signaling and stress metabolism. These enzymes have been subdivided into eight classes in plants. The Phi class, being plant specific, is the most represented. In the present study, based on the sequences available at GenBank, different primers were d...
DAFS: simultaneous aligning and folding of RNA sequences via dual decomposition
MOTIVATION It is well known that the accuracy of RNA secondary structure prediction from a single sequence is limited, and thus a comparative approach that predicts a common secondary structure from aligned sequences is a better choice if homologous sequences with reliable alignments are available. However, correct secondary structure information is needed to produce reliable alignments of RNA ...
ORFeus: detection of distant homology using sequence profiles and predicted secondary structure
ORFeus is a fully automated, sensitive protein sequence similarity search server available to the academic community via the Structure Prediction Meta Server (http://BioInfo.PL/Meta/). The goal of the development of ORFeus was to increase the sensitivity of the detection of distantly related protein families. Predicted secondary structure information was added to the information about sequence ...
Journal title:
- Journal of computational biology : a journal of computational molecular cell biology
Volume 17, Issue 3
Pages: -
Publication date: 2010